Generalization performance of spectro-temporal speech features

Author

Martin Heckmann

Abstract

Introduction: Although the dynamic aspects of speech are very important, conventional speech features such as Mel Frequency Cepstral Coefficients (MFCCs) [1] and RelAtive SpecTrAl Perceptual Linear Predictive (RASTA-PLP) features [2] capture only stationary spectral information. We previously showed that combining conventional speech features with spectro-temporal speech features yields improved recognition results in noisy speech [3, 4, 5]. We termed the latter Hierarchical Spectro-Temporal (HIST) features. They consist of two layers, the first capturing local spectro-temporal variations and the second integrating them into larger receptive fields (compare Fig. 1). This layout was inspired by a recently proposed system for visual object recognition [6]. On the first layer we apply Independent Component Analysis (ICA), and on the second layer we apply different learning algorithms, detailed below. Finally, we use Principal Component Analysis (PCA) to orthogonalize the features and further reduce their dimensionality, followed by a Hidden Markov Model (HMM) for recognition.
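The final PCA step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `pca_reduce`, the random stand-in data, and the dimensions (500 frames, 40 input features, 10 retained components) are all assumptions chosen for illustration. It only shows how projecting onto principal components both decorrelates the features and reduces their dimensionality.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project feature vectors (one per row) onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components, mean


# Hypothetical stand-in for second-layer feature vectors (NOT real HIST features):
# 500 frames of 40 correlated dimensions produced by a random linear mixing.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(40, 40))
frames = rng.normal(size=(500, 40)) @ mixing

reduced, components, mean = pca_reduce(frames, n_components=10)

# Covariance of the projected features; it should be (numerically) diagonal.
cov = np.cov(reduced, rowvar=False)
```

In a recognizer of the kind the abstract outlines, the rows of `reduced` would play the role of the observation vectors passed to the HMM; how many components to keep is a tuning choice that the abstract does not specify.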

Similar resources

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...

Robust Speech Features and Acoustic Models for Speech Recognition

This thesis examines techniques to improve the robustness of automatic speech recognition (ASR) systems against noise distortions. The study is important as the performance of ASR systems degrades dramatically in adverse environments, and hence greatly limits the speech recognition application deployment in realistic environments. Towards this end, we examine a feature compensation approach and...

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Sensorimotor Representation of Speech Perception. Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI

Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such as place of articulation. Selective attention can shape the processing and performance of speech p...

Publication date: 2011
